For each panel, the visual idea appears in italics below the title, the panel text in the quote, and the image contents below that. Images are SVG vector graphics with transparent backgrounds.
Establishing shot, for example from outside the classroom looking in, with a creative twist on the “two groups” theme from the text (looking at two very different classrooms?)
Imagine that we are doing research on two groups. Our null hypothesis \(H_0\) is that the group means are equal, our alternative hypothesis \(H_1\) is that they differ.
The first image of Sunday in front of the blackboard/whiteboard
If \(H_0\) is true, all possible p-values are equally likely. We say that it is uniformly distributed.
# H0: p-values follow Beta(1, 1), i.e. uniform on [0, 1]
x <- seq(0, 1, length.out = 500)  # grid of p-values
par(mar = c(4,5,3,1))
plot(x, dbeta(x, 1, 1),
type = "l", xlim = c(0,1), ylim = c(0,6), bty = "L",
axes = F, ylab = "", xlab = "",
col = "dark blue", lwd = 2, cex = 1.7)
axis(2, at = c(0,1,6), lwd = 2, cex = 1.7, las=1)
axis(1, at = c(0,1), lwd = 2, cex = 1.7)
mtext(side= 3, expression(bold("No Effect")), cex=1.5)
mtext(side = 2, expression(italic("f")*"("*italic("p")*"|H)"), line = 1, las=1, cex = 1.5)
mtext(side=1, expression(italic("p")*"-value"), line=1.5, cex=1.5)
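The uniformity claim can also be checked by simulation. A minimal sketch (not part of the panels; the sample size of 30 per group and the 10,000 replications are arbitrary illustration choices):

```r
# Sketch: many two-sample t-tests when H0 is true (both groups have equal means).
# n = 30 per group and 10,000 replications are arbitrary illustration choices.
set.seed(1)
pvals <- replicate(10000, t.test(rnorm(30), rnorm(30))$p.value)
mean(pvals < 0.05)  # close to 0.05: every p-value interval is equally likely
hist(pvals, breaks = 20, main = "p-values under H0", xlab = "p-value")
```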
A (slightly different?) image of Sunday in front of the blackboard/whiteboard. This image could be wide to accommodate the two plots side-by-side
Instead, when \(H_1\) is true, lower p-values are more likely than higher ones. By studying p-values, Sellke, Bayarri, and Berger (2001) showed that they approximately follow a Beta(\(\xi\), 1) distribution with \(\xi < 1\). Let’s look at the most interesting part of the \(p\)-value distribution: the lowest values, from 0 to 0.2.
# Small effect
x2 <- seq(0.001, 0.2, length.out = 500)  # low p-values only; start above 0, where the Beta density diverges
par(mar = c(4,5,3,1))
plot(x2, dbeta(x2, 0.8, 1),
type = "l", xlim = c(0,0.2), ylim = c(0,6), bty = "L",
axes = F, ylab = "", xlab = "",
col = "dark green", lwd = 2)
lines(x = c(0,0.2), y = c(1,1), lty = 3, col = "dark blue", lwd = 1)
axis(2, at = c(0,1,6), lwd = 2, cex = 1.7, las=1)
axis(1, at = c(0,0.2), lwd = 2, cex = 1.7)
text(x = 0.02, y=2.3, expression(H[1]: italic(p) ~ "~ Beta("*xi*"=0.8, 1)"), cex = 1.3, col = "dark green", pos=4)
mtext(side= 3, expression(bold("Small Effect")), cex=1.5)
mtext(side = 2, expression(italic("f")*"("*italic("p")*"|H)"), line = 1, las=1, cex = 1.5)
mtext(side=1, expression(italic("p")*"-value"), line=1.5, cex=1.5)
# Large effect
par(mar = c(4,5,3,1))
plot(x2, dbeta(x2, 0.1, 1),
type = "l", xlim = c(0,0.2), ylim = c(0,6), bty = "L",
axes = F, ylab = "", xlab = "",
col = "dark green", lwd = 2, cex = 4)
lines(x = c(0,0.2), y = c(1,1), lty = 3, col = "dark blue", lwd = 1)
axis(2, at = c(0,1,6), lwd = 2, cex = 1.7, las=1)
axis(1, at = c(0,0.2), lwd = 2, cex = 1.7)
text(x = 0.033, y=3, expression(H[1]: italic(p) ~ "~ Beta("*xi*"=0.1, 1)"), cex = 1.3, col = "dark green", pos=4)
mtext(side= 3, expression(bold("Large Effect")), cex=1.5)
mtext(side = 2, expression(italic("f")*"("*italic("p")*"|H)"), line = 1, las=1, cex = 1.5)
mtext(side=1, expression(italic("p")*"-value"), line=1.5, cex=1.5)
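The pile-up of low p-values under \(H_1\) can likewise be simulated. A sketch (the true mean difference of 0.5 SD and n = 30 per group are arbitrary illustration choices):

```r
# Sketch: the same t-test simulation when H1 is true
# (true mean difference of 0.5 SD; parameters are illustration choices).
set.seed(1)
pvals1 <- replicate(10000, t.test(rnorm(30), rnorm(30, mean = 0.5))$p.value)
mean(pvals1 < 0.05)  # well above 0.05: low p-values pile up when an effect exists
```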
The diagnosticity of a p-value is the extent to which that p-value is more likely under the alternative hypothesis \(H_1\) than under the null hypothesis \(H_0\). To understand this better, let’s consider two “worlds”:
See the panel idea in the drawing below. Here, \(\mathcal{N}\) means normal distribution.
In world A there is a small effect and in world B there is no effect. We don’t know which “world” we live in, but in our statistical test we obtain a p-value of 0.05 – a “significant” difference. How much more likely is it that we receive this p-value from world A than from world B? We can use the p-value distributions!
par(mar = c(4,5,3,1))
plot(x2, dbeta(x2, 0.8, 1),
type = "l", xlim = c(0,0.2), ylim = c(0,6), bty = "L",
axes = F, ylab = "", xlab ="",
col = "dark green", lwd = 2, cex = 4)
arrows(0.05, -0.5, 0.05, 0.85, length = 0.1, angle = 30, lwd = 2, code = 2, col = "red")
axis(2, at = c(0,1,round(dbeta(0.05,0.8,1),1),6), lwd = 2, cex = 1.7,las=1)
axis(1, at = c(0,0.05,0.2), lwd = 2, cex = 1.7)
lines(x = c(0.05,0.05), y = c(1,dbeta(0.05,0.8,1)), lwd = 1.5)
lines(x = c(0,0.2), y = c(1,1), lty = 3, col = "dark blue", lwd = 2)
lines(x = c(0,0.05), y = rep(dbeta(0.05,0.8,1),2), lty = 3, col = "dark green", lwd = 2)
points(x = c(0.05,0.05),
y = c(1, dbeta(0.05,0.8,1)),
pch = 21, bg = "grey", cex=1.5, lwd=1.5)
text(x = 0.03, y = 4.5,labels = "World A", col = "dark green", cex = 1.5)
text(x = 0.14, y = 0.7,labels = "World B", col = "dark blue", cex = 1.5)
mtext(side= 3, expression(bold("We observe p=0.05,")), cex=1.5, line=1.5)
mtext(side= 3, expression(bold("which world do we inhabit?")), cex=1.5, line=0)
mtext(side = 2, expression(italic("f")*"("*italic("p")*"|H)"), line = 1, las=1, cex = 1.5)
mtext(side=1, expression(italic("p")*"-value"), line=1.5, cex=1.5)
Closeup of Sunday’s face (with this image still in the background?)
Because the y-axis in this plot indicates the probability density of obtaining a certain p-value, we can immediately see the answer by looking at the ratio of the two densities: with a p-value of 0.05, it is about 1.5 times more likely that we are in world A. That does not seem like very strong evidence in favour of world A!
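The 1.5 figure follows directly from the two densities at p = 0.05, using the Beta(0.8, 1) distribution for world A and the uniform Beta(1, 1) distribution for world B:

```r
# Ratio of the two densities at p = 0.05
fA <- dbeta(0.05, 0.8, 1)  # world A: small effect, Beta(0.8, 1)
fB <- dbeta(0.05, 1, 1)    # world B: no effect, uniform, density exactly 1
fA / fB                    # about 1.46
```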
Class/student image asking a question, possibly with raised hand
But teacher, we could only calculate this because we knew the true effect in world A! In the real world, we don’t know the true effect under \(H_1\), so we can’t draw the green line!
Back to Sunday (fun twist, maybe she can actually pick cherries?)
Correct, so we cheat by cherry-picking the effect size that makes \(H_1\) look as good as it can be. If we do this, we obtain the maximum p-ratio – the MPR:
Whiteboard image again, perhaps Sunday pointing at the 2.5 in the image?
See? The maximum diagnosticity of a p-value of 0.05 is just under 2.5. That means that this p-value is a bit less than 2.5 times more likely to appear under \(H_1\) than under the null hypothesis.
par(mar = c(4,5,3,1))
plot(x2, dbeta(x2, 0.33, 1),
type = "l", xlim = c(0,0.2), ylim = c(0,6), bty = "L",
axes = F, ylab = "", xlab = "",
col = "dark green", lwd = 2, cex = 4)
lines(x2, dbeta(x2,0.8,1), col = "dark green", lty = 3)
lines(x2, dbeta(x2,0.7,1), col = "dark green", lty = 3)
lines(x2, dbeta(x2,0.4,1), col = "dark green", lty = 3)
lines(x2, dbeta(x2,0.2,1), col = "dark green", lty = 3)
lines(x2, dbeta(x2,0.15,1), col = "dark green", lty = 3)
arrows(0.05, -0.5, 0.05, 0.85, length = 0.1, angle = 30, lwd = 2, code = 2, col = "red")
axis(2, at = c(0,1,round(dbeta(0.05,0.33,1),1),6), lwd = 2, cex = 1.7,las=1)
axis(1, at = c(0,0.05,0.2), lwd = 2, cex = 1.7)
lines(x = c(0.05,0.05), y = c(1,dbeta(0.05,0.33,1)), lwd = 1.5)
lines(x = c(0,0.2), y = c(1,1), lty = 3, col = "dark blue", lwd = 2)
lines(x = c(0,0.05), y = rep(dbeta(0.05,0.33,1),2), lty = 3, col = "dark green", lwd = 2)
points(x = c(0.05,0.05),
y = c(1, dbeta(0.05,0.33,1)),
pch = 21, bg = "grey", cex=1.5, lwd=1.5)
text(x = 0.03, y = 4.5,labels = expression(H[1]: italic(p) ~ "~ Beta("*xi*"=0.33, 1)"), col = "dark green", cex = 1.3, pos = 4)
text(x = 0.06, y = 0.6,labels = expression(H[0]: italic(p) ~ "~ Beta(1, 1)"), col = "dark blue", cex = 1.3, pos=4)
mtext(side= 3, expression(bold("Maximum diagnosticity:")), cex=1.5, line = 1.5)
mtext(side=3,expression(bold("maximum p-ratio")), cex=1.5, line = 0)
mtext(side = 2, expression(italic("f")*"("*italic("p")*"|H)"), line = 1, las=1, cex = 1.5, padj=-2)
mtext(side=1, expression(italic("p")*"-value"), line=1.5, cex=1.5)
Sunday drawing math on the whiteboard
Sellke, Bayarri, and Berger showed that, for low \(p\)-values, the MPR can be calculated like this. The formula was first derived by Vladimir Vovk, so it’s called the “Vovk-Sellke Maximum p-Ratio”. For \(p=0.05\), the MPR equals 2.456!
\[\mathrm{MPR}=(-e p \ln p)^{-1} = 2.456\]
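The formula can be cross-checked numerically: the Beta(\(\xi\), 1) density at \(p\) is \(\xi p^{\xi-1}\), which is maximized at \(\xi = -1/\ln p\), and plugging that value in reproduces the closed form. A sketch:

```r
# Vovk-Sellke maximum p-ratio (the closed form holds for p < 1/e)
mpr <- function(p) 1 / (-exp(1) * p * log(p))
mpr(0.05)                 # 2.456

# Cross-check: cherry-pick the xi that maximizes the Beta(xi, 1) density at p
xi_max <- -1 / log(0.05)  # about 0.33, the value used in the plot above
dbeta(0.05, xi_max, 1)    # the same number: the maximized p-ratio
```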
Screenshot of JASP, maybe Sunday behind a computer?
JASP has an option to show the MPR for each p-value you obtain!
(Different?) student asking a critical question, looking confused?
But teacher, that’s still not a lot of evidence in favour of \(H_1\), is it?
Sunday standing in front of class, show the backs of the heads of the class
I agree! Even when we cheat to favour \(H_1\), p-values down to 0.01 are never much more likely to occur under \(H_1\) than under \(H_0\). Beware of \(p\)-values larger than this, as the evidence for \(H_1\) is not always as strong as it seems!
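Sunday's warning can be made concrete by tabulating the MPR at common significance thresholds (a sketch; the formula holds for p < 1/e):

```r
# MPR at common significance thresholds
mpr <- function(p) 1 / (-exp(1) * p * log(p))
round(mpr(c(0.05, 0.01, 0.005, 0.001)), 2)  # 2.46  7.99  13.89  53.26
```

Even p = 0.01 caps the evidence ratio at about 8, and p = 0.001 at about 53.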
Similar image
What MPR value is enough for you to believe the result? That’s for you to decide. Some choose to interpret this ratio as an upper bound on the Bayes factor, but that’s something for another time…
Thanks teacher!